AITopics | style information


style information

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation

Neural Information Processing Systems | Mar-22-2026, 00:29:01 GMT

The emerging vision foundation model (VFM) has inherited the ability to generalize to unseen images. Nevertheless, the key challenge of domain-generalized semantic segmentation (DGSS) lies in the domain gap attributed to the cross-domain styles, i.e., the variance of urban landscape and environment dependencies. Hence, maintaining the style-invariant property with varying domain styles becomes the key bottleneck in harnessing VFM for DGSS. The frequency space after Haar wavelet transformation provides a feasible way to decouple the style information from the domain-invariant content, since the content and style information are retained in the low- and high-frequency components of the space, respectively. To this end, we propose a novel Frequency-Adapted (FADA) learning scheme to advance the frontier. Its overall idea is to separately tackle the content and style information by frequency tokens throughout the learning process. Particularly, the proposed FADA consists of two branches, i.e., low- and high-frequency branches. The former is able to stabilize the scene content, while the latter learns the scene styles and eliminates their impact on DGSS. Experiments conducted on various DGSS settings show the state-of-the-art performance of our FADA and its versatility across a variety of VFMs. Source code is available at https://github.com/BiQiWHU/FADA
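For readers unfamiliar with the decomposition this abstract relies on, here is a minimal sketch of a single-level 2D Haar wavelet transform in plain Python. It is not taken from the FADA repository: the function name `haar_dwt2`, the sub-band names, and the choice to operate on a raw 2D array (rather than on the frequency tokens the abstract mentions) are all assumptions made for illustration. It only shows how one low-frequency band and three high-frequency bands come out of each 2x2 block.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level orthonormal 2D Haar transform (illustrative helper).

    Returns (low, horiz, vert, diag): `low` is the low-frequency
    approximation; the other three are high-frequency detail sub-bands.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")
    low, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        low_row, horiz_row, vert_row, diag_row = [], [], [], []
        for c in range(0, cols, 2):
            # One 2x2 block: a b on the top row, d e on the bottom row.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            low_row.append((a + b + d + e) / 2.0)    # scaled block sum
            horiz_row.append((a - b + d - e) / 2.0)  # left minus right
            vert_row.append((a + b - d - e) / 2.0)   # top minus bottom
            diag_row.append((a - b - d + e) / 2.0)   # diagonal difference
        low.append(low_row)
        horiz.append(horiz_row)
        vert.append(vert_row)
        diag.append(diag_row)
    return low, horiz, vert, diag


image = [[1.0, 2.0], [3.0, 4.0]]
low, horiz, vert, diag = haar_dwt2(image)
print(low, horiz, vert, diag)  # [[5.0]] [[-1.0]] [[-2.0]] [[0.0]]
```

A perfectly flat image yields zeros in all three detail bands, which is the sense in which smooth scene structure ends up in the low-frequency band while local variation ends up in the high-frequency ones.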

artificial intelligence, machine learning, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)


Style Adaptation and Uncertainty Estimation for Multi-Source Blended-Target Domain Adaptation

Neural Information Processing Systems | Feb-17-2026, 00:16:05 GMT

Blended-target domain adaptation (BTDA), which implicitly mixes multiple sub-target domains into a fine domain, has attracted more attention in recent years.

artificial intelligence, domain adaptation, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)


Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks

Hyeonseob Nam, Hyo-Eun Kim

Neural Information Processing Systems | Feb-12-2026, 02:43:11 GMT

Neural Information Processing Systems http://nips.cc/

bin, normalization, style transfer, (15 more...)

Neural Information Processing Systems

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)


pixels (PixelCNN) that is conditioned on a latent code, and the recognition path uses a generative adversarial network (GAN) to impose a prior distribution on the

Neural Information Processing Systems | Nov-21-2025, 11:22:19 GMT

In this paper, we describe the "PixelGAN autoencoder", a generative autoencoder [...] Both networks are jointly trained to maximize a variational lower bound on the data log-likelihood. [...] In Section 2.1, we show that by imposing a Gaussian distribution on the latent code, we can achieve a global vs. local decomposition of information.

artificial intelligence, machine learning, pixelgan autoencoder, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


9e5f7743a4e753452f73d32da1190202-Paper-Conference.pdf

Neural Information Processing Systems | Oct-10-2025, 11:29:23 GMT

adaptation, domain adaptation, information, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)


Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation

Neural Information Processing Systems | May-27-2025, 12:12:59 GMT

The emerging vision foundation model (VFM) has inherited the ability to generalize to unseen images. Nevertheless, the key challenge of domain-generalized semantic segmentation (DGSS) lies in the domain gap attributed to the cross-domain styles, i.e., the variance of urban landscape and environment dependencies. Hence, maintaining the style-invariant property with varying domain styles becomes the key bottleneck in harnessing VFM for DGSS. The frequency space after Haar wavelet transformation provides a feasible way to decouple the style information from the domain-invariant content, since the content and style information are retained in the low- and high-frequency components of the space, respectively. To this end, we propose a novel Frequency-Adapted (FADA) learning scheme to advance the frontier. Its overall idea is to separately tackle the content and style information by frequency tokens throughout the learning process. Particularly, the proposed FADA consists of two branches, i.e., low- and high-frequency branches. The former is able to stabilize the scene content, while the latter learns the scene styles and eliminates their impact on DGSS. Experiments conducted on various DGSS settings show the state-of-the-art performance of our FADA and its versatility across a variety of VFMs. Source code is available at https://github.com/BiQiWHU/FADA

domain generalized semantic segmentation, learning frequency-adapted vision foundation model, style information, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.42)


Few-shot Semantic Encoding and Decoding for Video Surveillance

Cheng, Baoping, Zhang, Yukun, Wang, Liming, Xie, Xiaoyan, Fu, Tao, Wang, Dongkun, Tao, Xiaoming

arXiv.org Artificial Intelligence | May-13-2025

With the continuous increase in the number and resolution of video surveillance cameras, the burden of transmitting and storing surveillance video is growing. Traditional communication methods based on Shannon's theory are facing optimization bottlenecks. Semantic communication, as an emerging communication method, is expected to break through this bottleneck and reduce the storage and transmission consumption of video. Existing semantic decoding methods often require many samples to train the neural network for each scene, which is time-consuming and labor-intensive. In this study, a semantic encoding and decoding method for surveillance video is proposed. First, the sketch was extracted as semantic information, and a sketch compression method was proposed to reduce the bit rate of semantic information. Then, an image translation network was proposed to translate the sketch into a video frame with a reference frame. Finally, a few-shot sketch decoding network was proposed to reconstruct video from sketch. Experimental results showed that the proposed method achieved significantly better video reconstruction performance than baseline methods. The sketch compression method could effectively reduce the storage and transmission consumption of semantic information with little compromise on video quality. The proposed method provides a novel semantic encoding and decoding method that only needs a few training samples for each surveillance scene, thus improving the practicality of the semantic communication system.

artificial intelligence, machine learning, video, (18 more...)

arXiv.org Artificial Intelligence

2505.07381

Country: Asia > China (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Commercial Services & Supplies > Security & Alarm Services (0.92)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (0.97)
Information Technology > Communications > Networks (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Language-Driven Dual Style Mixing for Single-Domain Generalized Object Detection

Qin, Hongda, Lu, Xiao, Wei, Zhiyong, Cao, Yihong, Yang, Kailun, Chen, Ningjiang

arXiv.org Artificial Intelligence | May-13-2025

Generalizing an object detector trained on a single domain to multiple unseen domains is a challenging task. Existing methods typically introduce image or feature augmentation to diversify the source domain and raise the robustness of the detector. Vision-Language Model (VLM)-based augmentation techniques have been proven to be effective, but they require that the detector's backbone has the same structure as the image encoder of the VLM, limiting the choice of detector framework. To address this problem, we propose Language-Driven Dual Style Mixing (LDDS) for single-domain generalization, which diversifies the source domain by fully utilizing the semantic information of the VLM. Specifically, we first construct prompts to transfer style semantics embedded in the VLM to an image translation network. This facilitates the generation of style-diversified images with explicit semantic information. Then, we propose image-level style mixing between the diversified images and source domain images. This effectively mines the semantic information for image augmentation without relying on specific augmentation selections. Finally, we propose feature-level style mixing in a double-pipeline manner, allowing feature augmentation to be model-agnostic and to work seamlessly with mainstream detector frameworks, including one-stage, two-stage, and transformer-based detectors. Extensive experiments demonstrate the effectiveness of our approach across various benchmark datasets, including real to cartoon and normal to adverse weather tasks. The source code and pre-trained models will be publicly available at https://github.com/qinhongda8/LDDS.
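The abstract above does not say how feature-level style mixing is computed. One common way to mix "style" at the feature level, offered here purely as an assumption and not as the LDDS method, is to interpolate the per-channel mean and standard deviation of two feature maps and re-normalize the source features with the mixed statistics. The sketch below does this for a single channel flattened to a list; `mean_std`, `mix_style`, and the mixing weight `lam` are hypothetical names.

```python
import math
from typing import List, Tuple


def mean_std(values: List[float], eps: float = 1e-6) -> Tuple[float, float]:
    """Mean and (eps-stabilized) standard deviation of one feature channel."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)


def mix_style(source: List[float], styled: List[float], lam: float) -> List[float]:
    """Re-style `source` with statistics interpolated towards `styled`.

    lam = 1.0 keeps the source statistics; lam = 0.0 adopts the statistics
    of the style-diversified channel. This is an assumed formulation.
    """
    mu_s, sigma_s = mean_std(source)
    mu_t, sigma_t = mean_std(styled)
    mu_mix = lam * mu_s + (1.0 - lam) * mu_t
    sigma_mix = lam * sigma_s + (1.0 - lam) * sigma_t
    # Normalize with the source statistics, then apply the mixed ones.
    return [sigma_mix * (v - mu_s) / sigma_s + mu_mix for v in source]


source_channel = [1.0, 2.0, 3.0]
styled_channel = [10.0, 20.0, 30.0]
mixed = mix_style(source_channel, styled_channel, lam=0.5)
```

Because the operation touches only channel statistics, it does not depend on the layer type that produced the features, which is one plausible reading of the abstract's "model-agnostic" claim.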

information, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.07219

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis

Luo, Dan, Ma, Chengyuan, Li, Weiqin, Wang, Jun, Chen, Wei, Wu, Zhiyong

arXiv.org Artificial Intelligence | Apr-15-2025

With the advancement of speech synthesis technology, users have higher expectations for the naturalness and expressiveness of synthesized speech. However, previous research has ignored the importance of prompt selection. This study proposes a text-to-speech (TTS) framework based on Retrieval-Augmented Generation (RAG) technology, which can dynamically adjust the speech style according to the text content to achieve more natural and vivid communication effects. We have constructed a speech style knowledge database containing high-quality speech samples in various contexts and developed a style matching scheme. This scheme uses embeddings, extracted by Llama, PER-LLM-Embedder, and Moka, to match with samples in the knowledge database, selecting the most appropriate speech style for synthesis. Furthermore, our empirical research validates the effectiveness of the proposed method. Our demo can be viewed at: https://thuhcsi.github.io/icme2025-AutoStyle-TTS
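The style matching step described in this abstract is, at its core, a nearest-neighbour lookup over embeddings. The sketch below illustrates that idea under stated assumptions: the abstract does not name the similarity measure, so cosine similarity is assumed; the three-dimensional vectors and style labels are toy values; and `cosine_similarity` and `retrieve_style` are hypothetical helpers, not part of the AutoStyle-TTS code.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve_style(query: Sequence[float],
                   database: Dict[str, Sequence[float]]) -> str:
    """Return the label of the stored sample most similar to the query."""
    return max(database, key=lambda name: cosine_similarity(query, database[name]))


# Toy knowledge database: style label -> embedding of a stored speech sample.
style_db = {
    "calm": [1.0, 0.0, 0.0],
    "excited": [0.0, 1.0, 0.0],
}
text_embedding = [0.9, 0.1, 0.0]  # stand-in for an embedding of the input text
print(retrieve_style(text_embedding, style_db))  # calm
```

In a real system the embeddings would come from the models named in the abstract and the database would hold many samples per style, so an approximate nearest-neighbour index would likely replace the linear scan used here.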

information, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2504.10309

Country: Asia > China (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)


